feat: Update OpenAI graph runner to return AgentGraphRunnerResult with GraphMetrics #155

Draft
jsonbailey wants to merge 20 commits into jb/aic-2174/graph-tracking-refactor from jb/aic-2174/openai-graph-runner

@jsonbailey
Contributor

Summary

  • Removes all direct LaunchDarkly tracker calls from OpenAIAgentGraphRunner
  • Introduces _NodeMetricsAccumulator — a lightweight per-node metrics collector replacing LDAIConfigTracker inside the runner
  • Runner now returns AgentGraphRunnerResult with populated GraphMetrics (path, duration_ms, usage, node_metrics)
  • Graph-level and per-node tracking events are emitted by ManagedAgentGraph._flush_graph_tracking() from the result metrics
  • ManagedAgentGraph._flush_graph_tracking() extended to drive per-node tracking from result.metrics.node_metrics using graph node tracker factories
  • Integration tests updated to exercise the full ManagedAgentGraph.run() pipeline (tracking events now come from the managed layer)
  • Handoff-level track_handoff_success() calls removed (per spec: path field is sufficient; handoffs are not in GraphMetrics)
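
The runner-side shape described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: only the names GraphMetrics, path, duration_ms, usage and node_metrics come from the summary. TokenUsage, NodeMetrics, the accumulator's method names and build_graph_metrics are hypothetical, and the real _NodeMetricsAccumulator may look quite different.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TokenUsage:
    """Hypothetical token counts; stands in for the SDK's usage type."""
    input: int = 0
    output: int = 0

    @property
    def total(self) -> int:
        return self.input + self.output


@dataclass
class NodeMetrics:
    """Hypothetical per-node entry stored in GraphMetrics.node_metrics."""
    duration_ms: int = 0
    usage: TokenUsage = field(default_factory=TokenUsage)
    tool_calls: List[str] = field(default_factory=list)


@dataclass
class GraphMetrics:
    """Field names follow the summary; the field types are assumptions."""
    path: List[str]
    duration_ms: int
    usage: TokenUsage
    node_metrics: Dict[str, NodeMetrics]


class NodeMetricsAccumulator:
    """Collects metrics for one node without touching any tracker."""

    def __init__(self) -> None:
        self._metrics = NodeMetrics()

    def add_usage(self, input_tokens: int, output_tokens: int) -> None:
        self._metrics.usage.input += input_tokens
        self._metrics.usage.output += output_tokens

    def add_tool_call(self, name: str) -> None:
        self._metrics.tool_calls.append(name)

    def add_duration(self, ms: int) -> None:
        self._metrics.duration_ms += ms

    def snapshot(self) -> NodeMetrics:
        return self._metrics


def build_graph_metrics(
    path: List[str],
    accumulators: Dict[str, NodeMetricsAccumulator],
    duration_ms: int,
) -> GraphMetrics:
    """Fold per-node accumulators into one graph-level result."""
    node_metrics = {key: acc.snapshot() for key, acc in accumulators.items()}
    total = TokenUsage(
        input=sum(m.usage.input for m in node_metrics.values()),
        output=sum(m.usage.output for m in node_metrics.values()),
    )
    return GraphMetrics(path, duration_ms, total, node_metrics)
```

The point of the split is that nothing in this sketch emits an event. The runner only gathers numbers, and the managed layer decides what to report.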

Depends on

Test plan

  • All existing tests pass (uv run pytest packages/ai-providers/server-ai-openai/tests/)
  • test_openai_agent_graph_runner.py: runner returns new shape, no tracker created
  • test_tracking_openai_agents.py: graph-level and per-node events emitted through managed layer

🤖 Generated with Claude Code

@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from fcbcb18 to 9286d53 Compare April 29, 2026 13:15
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch from 9733a28 to 44501e3 Compare April 29, 2026 13:15
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from 9286d53 to 72fc13e Compare April 29, 2026 13:19
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch from 44501e3 to 142e041 Compare April 29, 2026 13:19
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from 72fc13e to bde4f09 Compare April 29, 2026 13:22
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch from 142e041 to fb3c0f6 Compare April 29, 2026 13:22
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from bde4f09 to c376011 Compare April 29, 2026 13:52
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch from fb3c0f6 to b3547b0 Compare April 29, 2026 13:52
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from c376011 to 7f67e4f Compare April 29, 2026 13:57
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch from b3547b0 to 1d4ddb2 Compare April 29, 2026 13:57
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from 7f67e4f to a89c6a2 Compare April 29, 2026 14:38
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch from 1d4ddb2 to 6201d09 Compare April 29, 2026 14:38
jsonbailey and others added 2 commits April 29, 2026 11:25
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from a89c6a2 to c69a9ff Compare April 29, 2026 16:33
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch from 6201d09 to ef4216c Compare April 29, 2026 16:34
jsonbailey and others added 9 commits April 29, 2026 11:49
The new track_tool_calls method at line 413 (with summary storage and
dedup guard) was being shadowed by the older method at line 559 (which
only fired per-tool events). Merge them into a single method that both
stores to the summary and fires per-tool events.
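
The shadowing described above is ordinary Python behavior: a second `def` with the same name in a class body rebinds the name, so the first body can never be called. The sketch below reproduces the bug and one possible merged form. The class names, the `summary_tool_calls` and `events` attributes, and the exact semantics of the dedup guard are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple


class ShadowedTracker:
    def track_tool_calls(self, names: Iterable[str]) -> Tuple[str, List[str]]:
        # Earlier definition: meant to store a summary.
        return ("summary", list(names))

    def track_tool_calls(self, names: Iterable[str]) -> Tuple[str, List[str]]:  # noqa: F811
        # Later definition rebinds the name; the body above is unreachable.
        return ("events", list(names))


class MergedTracker:
    def __init__(self) -> None:
        self.summary_tool_calls: Optional[List[str]] = None
        self.events: List[str] = []

    def track_tool_calls(self, names: Iterable[str]) -> None:
        calls = list(names)
        # Assumed dedup guard: the summary is stored only on the first call.
        if self.summary_tool_calls is None:
            self.summary_tool_calls = calls
        # Per-tool events fire for every tool name.
        for name in calls:
            self.events.append(name)
```

A linter rule such as pyflakes F811 (redefinition of an unused name) flags this kind of duplicate.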
Previously, metrics_extractor(result) was called twice — once in the
public track_metrics_of/track_metrics_of_async to read duration_ms,
and again inside _track_from_metrics_extractor to track success,
tokens, and tool calls. Extract metrics once in the public method and
pass the resulting metrics + elapsed_ms into the private helper, which
now also handles the duration tracking.
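
A minimal sketch of the extract-once flow described above. The `Metrics` and `Tracker` classes and the `events` list are hypothetical stand-ins, and preferring a reported `duration_ms` over the wall-clock value is an assumption about how the duration is chosen.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Metrics:
    """Hypothetical stand-in for the SDK's metrics type."""
    success: bool
    duration_ms: Optional[int] = None


class Tracker:
    def __init__(self) -> None:
        self.events: List[Tuple[str, Any]] = []

    def track_metrics_of(
        self, extractor: Callable[[Any], Metrics], func: Callable[[], Any]
    ) -> Any:
        start = time.perf_counter()
        result = func()
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        metrics = extractor(result)  # the only extraction call
        self._track_from_metrics(metrics, elapsed_ms)
        return result

    def _track_from_metrics(self, metrics: Metrics, elapsed_ms: int) -> None:
        # Assumption: a provider-reported duration wins over wall clock.
        if metrics.duration_ms is not None:
            reported_ms = metrics.duration_ms
        else:
            reported_ms = elapsed_ms
        self.events.append(("duration", reported_ms))
        self.events.append(("success" if metrics.success else "error", None))
```

Calling the extractor once matters when it is expensive or not idempotent, for example when it consumes a streamed response.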
ManagedModel and ManagedAgent now require a Runner. The compat shims
(_invoke_runner, isinstance(result, RunnerResult) branches, Union
type annotations) are removed; result handling is direct on
RunnerResult fields.

The deprecated ManagedModel.invoke() is preserved for backwards compat
but now delegates to run() and adapts the ManagedResult into the legacy
ModelResponse shape.

ModelRunner and AgentRunner protocol definitions remain in place so
downstream provider packages that import them continue to work.

- Drop the inconsistent 'if metrics else None' guard on reported_ms;
  the next line already dereferences metrics.success unconditionally.
- Use 'is not None' for tool_calls so an explicit empty list still
  triggers tracking (preserves the distinction between 'not tracked'
  and 'tracked with no calls').
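
The second bullet rests on a Python detail: an empty list is falsy, so a plain truthiness test cannot tell `[]` apart from `None`. The two helper names below are hypothetical and exist only to show the difference.

```python
from typing import List, Optional


def should_track_truthy(tool_calls: Optional[List[str]]) -> bool:
    # Truthiness treats [] the same as None, so "no calls" is never tracked.
    return bool(tool_calls)


def should_track_explicit(tool_calls: Optional[List[str]]) -> bool:
    # Identity check keeps "tracked with no calls" distinct from "not tracked".
    return tool_calls is not None
```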
Drop the deprecated invoke() method from the managed layer along with
its dedicated test class and the warnings/LDAIMetrics/ModelResponse
imports that were only needed by it. Type definitions in providers/
remain so downstream provider packages keep building.

…unner]

The factory's downstream consumers (ManagedModel, ManagedAgent) now
take Runner; aligning the factory's return types lets us drop the
type: ignore comments at the ManagedModel/ManagedAgent call sites.
Provider package PRs will update their concrete implementations to
match.

Judge still takes ModelRunner, so its call site picks up the
type: ignore[arg-type] in its place — that's resolved later in the
cleanup PR when Judge migrates to Runner.

Move the metrics_extractor call inside _track_from_metrics_extractor
so extraction errors are caught and logged without bubbling up. When
extraction fails or returns None, only the wall-clock duration is
tracked — success/error is left untouched since the underlying model
call itself succeeded.

Also tighten the tool_calls check to access metrics.tool_calls
directly, mirroring how metrics.usage is accessed.
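
One way to read the behavior described above, as a standalone sketch. The function signature, the `events` list and the `Metrics` dataclass are hypothetical, and the real helper is a method on the tracker rather than a free function.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

log = logging.getLogger(__name__)


@dataclass
class Metrics:
    """Hypothetical stand-in for the SDK's metrics type."""
    success: bool
    tool_calls: Optional[List[str]] = None


def track_from_metrics_extractor(
    events: List[Tuple[str, Any]],
    extractor: Callable[[Any], Optional[Metrics]],
    result: Any,
    elapsed_ms: int,
) -> None:
    try:
        metrics = extractor(result)
    except Exception:
        # Extraction problems are logged, never raised to the caller.
        log.warning("metrics extraction failed", exc_info=True)
        metrics = None

    # Wall-clock duration is recorded in every case.
    events.append(("duration", elapsed_ms))
    if metrics is None:
        # Success/error stays untouched: the model call itself succeeded.
        return

    events.append(("success" if metrics.success else "error", None))
    # Direct attribute access, with an explicit None check.
    if metrics.tool_calls is not None:
        events.append(("tool_calls", metrics.tool_calls))
```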
- Judge now accepts Runner instead of ModelRunner
- evaluate() calls runner.run(output_type=...) instead of invoke_structured_model
- response.parsed replaces StructuredResponse.data; None guard added
- evaluate_messages() accepts RunnerResult instead of ModelResponse
- Tests updated to use RunnerResult and mock_runner.run

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
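
A rough sketch of the Judge change listed above, reduced to the `output_type` call and the None guard on `parsed`. Everything except the names `run`, `output_type`, `parsed` and `RunnerResult` is hypothetical (`Verdict`, `FakeRunner`, the free function `evaluate`), and the real method is likely asynchronous; this version is synchronous to stay short.

```python
from dataclasses import dataclass
from typing import Any, Optional, Type


@dataclass
class RunnerResult:
    """Reduced stand-in: only the fields this sketch needs."""
    content: str
    parsed: Optional[Any] = None


@dataclass
class Verdict:
    """Hypothetical structured output requested from the runner."""
    score: float
    reasoning: str


class FakeRunner:
    """Test double that returns a preset parsed value."""

    def __init__(self, parsed: Optional[Verdict]) -> None:
        self._parsed = parsed
        self.last_output_type: Optional[Type[Any]] = None

    def run(self, input: str, output_type: Optional[Type[Any]] = None) -> RunnerResult:
        self.last_output_type = output_type
        return RunnerResult(content="", parsed=self._parsed)


def evaluate(runner: FakeRunner, prompt: str) -> Optional[Verdict]:
    response = runner.run(prompt, output_type=Verdict)
    # None guard: structured parsing may not have produced a value.
    if response.parsed is None:
        return None
    return response.parsed
```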
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from c69a9ff to 14cfa92 Compare April 30, 2026 14:03
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch from ef4216c to 8ecce16 Compare April 30, 2026 14:05
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from 14cfa92 to 1ed1a44 Compare April 30, 2026 14:23
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch 2 times, most recently from 8ecce16 to 09af502 Compare April 30, 2026 14:23
jsonbailey and others added 7 commits April 30, 2026 09:41
…nnerResult

- OpenAIModelRunner.run() implements the unified Runner protocol; returns RunnerResult
  with content, metrics (LDAIMetrics), raw, and parsed fields. Structured output is
  supported via the output_type parameter.
- OpenAIAgentRunner.run() updated to return RunnerResult; populates tool_calls in
  LDAIMetrics from observed openai-agents ToolCallItems.
- Legacy invoke_model() and invoke_structured_model() retained as deprecated adapters
  that delegate to run() and wrap results into ModelResponse / StructuredResponse for
  backward compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
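
The deprecated-adapter pattern in the last bullet can be sketched as below. The classes are reduced stand-ins with a single field each, and the `message` field on `ModelResponse` is an assumption. Emitting a `DeprecationWarning` is also an assumption, since the commit message only says the methods are retained as deprecated.

```python
import warnings
from dataclasses import dataclass


@dataclass
class RunnerResult:
    content: str


@dataclass
class ModelResponse:
    """Legacy shape; the single 'message' field is an assumption."""
    message: str


class ModelRunner:
    def run(self, prompt: str) -> RunnerResult:
        # Placeholder for the real provider call.
        return RunnerResult(content=prompt.upper())

    def invoke_model(self, prompt: str) -> ModelResponse:
        warnings.warn(
            "invoke_model() is deprecated; use run() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        # Delegate to run() and wrap the result in the legacy shape.
        result = self.run(prompt)
        return ModelResponse(message=result.content)
```

Keeping all logic in `run()` means the adapter cannot drift from the new code path; it only reshapes the return value.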
…nner

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… RunnerResult

- LangChainModelRunner.run() implements the unified Runner protocol; returns RunnerResult
  with content, metrics (LDAIMetrics), raw, and parsed fields. Structured output is
  supported via the output_type parameter.
- LangChainAgentRunner.run() updated to return RunnerResult; populates tool_calls in
  LDAIMetrics from observed tool_calls in message responses.
- Legacy invoke_model() and invoke_structured_model() retained as deprecated adapters
  that delegate to run() and wrap results into ModelResponse / StructuredResponse for
  backward compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rit Runner

- LangChainModelRunner: replaces invoke_model/invoke_structured_model with
  run(input, output_type=None); returns RunnerResult
- LangChainAgentRunner: replaces AgentResult with RunnerResult; run()
  signature gains optional output_type parameter
- Tests updated to call run() and assert result.content / result.parsed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rResult types

- Add GraphMetrics dataclass (runner-layer return type for graph runs)
- Add GraphMetricSummary dataclass (managed-layer metrics, analogous to
  LDAIMetricSummary for single-model invocations)
- Add ManagedGraphResult dataclass (managed-layer return type from ManagedAgentGraph)
- Add AgentGraphRunnerResult dataclass (future runner return type, no evaluations field)
- ManagedAgentGraph.run() now returns ManagedGraphResult with GraphMetricSummary
  built from the runner's AgentGraphResult metrics
- Export all new types from ldai package

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… new runner shape

ManagedAgentGraph.run() now detects the runner result type and dispatches
accordingly:
- AgentGraphRunnerResult (new shape): managed layer drives all graph-level
  tracking from result.metrics (path, duration, success/failure, total tokens)
  via the graph tracker. Node-level tracking from node_metrics will be wired
  once runners populate that field (PR 11-openai/langchain).
- AgentGraphResult (legacy shape): tracking already occurred inside the runner;
  managed layer wraps result without additional tracking.

ManagedAgentGraph now accepts an optional graph parameter (AgentGraphDefinition)
used to create the graph tracker. LDAIClient.create_agent_graph() passes the
resolved graph definition. This is a deliberate bridge pattern: the legacy
detection branch will be removed once both runners are migrated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
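
The bridge pattern described above comes down to an `isinstance` check on the runner result. In this sketch the two result classes are reduced to the fields needed here, metrics are a plain dict, and `wrap_result` with its `events` list is a hypothetical stand-in for the managed layer and its graph tracker.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple, Union


@dataclass
class AgentGraphRunnerResult:
    """New shape: carries metrics; the runner emitted no tracking events."""
    output: str
    metrics: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AgentGraphResult:
    """Legacy shape: the runner already tracked everything itself."""
    output: str


def wrap_result(
    result: Union[AgentGraphRunnerResult, AgentGraphResult],
    events: List[Tuple[str, Any]],
) -> str:
    if isinstance(result, AgentGraphRunnerResult):
        # New shape: the managed layer drives graph-level tracking.
        events.append(("path", result.metrics.get("path", [])))
        events.append(("duration_ms", result.metrics.get("duration_ms")))
    # Legacy shape: wrap without additional tracking.
    return result.output
```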
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch from 1ed1a44 to f016b0d Compare April 30, 2026 14:46
…h GraphMetrics

Remove all direct LaunchDarkly tracker calls from OpenAIAgentGraphRunner. The
runner now collects per-node metrics via _NodeMetricsAccumulator (a lightweight
accumulator replacing the per-node LDAIConfigTracker) and returns
AgentGraphRunnerResult with populated GraphMetrics (path, duration_ms, usage,
node_metrics). Graph-level and per-node tracking events are emitted by
ManagedAgentGraph._flush_graph_tracking() from the result.

ManagedAgentGraph._flush_graph_tracking() is extended to also drive per-node
tracking from result.metrics.node_metrics using the graph definition's node
tracker factories.

Integration tests in test_tracking_openai_agents.py are updated to run through
the full ManagedAgentGraph pipeline (ManagedAgentGraph.run()) so tracking events
are emitted by the managed layer as intended.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
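
A minimal sketch of how a managed layer might drive per-node tracking from `node_metrics` using per-node tracker factories. `RecordingTracker`, its `track` method, the event names and the free function `flush_graph_tracking` are hypothetical, as is skipping nodes that have no factory. The real method is `ManagedAgentGraph._flush_graph_tracking()` and uses the SDK's tracker API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class NodeMetrics:
    duration_ms: int
    total_tokens: int


@dataclass
class GraphMetrics:
    path: List[str]
    duration_ms: int
    node_metrics: Dict[str, NodeMetrics] = field(default_factory=dict)


class RecordingTracker:
    """Hypothetical tracker that appends (scope, name, value) to a sink."""

    def __init__(self, scope: str, sink: List[Tuple[str, str, object]]) -> None:
        self._scope = scope
        self._sink = sink

    def track(self, name: str, value: object) -> None:
        self._sink.append((self._scope, name, value))


def flush_graph_tracking(
    metrics: GraphMetrics,
    graph_tracker: RecordingTracker,
    node_tracker_factories: Dict[str, Callable[[], RecordingTracker]],
) -> None:
    # Graph-level events come first, straight from the result metrics.
    graph_tracker.track("path", metrics.path)
    graph_tracker.track("duration_ms", metrics.duration_ms)

    # Per-node events: one tracker per node, created on demand.
    for node_key, node in metrics.node_metrics.items():
        factory = node_tracker_factories.get(node_key)
        if factory is None:
            continue  # assumed: nodes without a factory are skipped
        tracker = factory()
        tracker.track("duration_ms", node.duration_ms)
        tracker.track("total_tokens", node.total_tokens)
```

No handoff event appears here, in line with the summary's note that the path field is considered sufficient.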
@jsonbailey jsonbailey force-pushed the jb/aic-2174/openai-graph-runner branch from 09af502 to 43bc879 Compare April 30, 2026 14:47
@jsonbailey jsonbailey force-pushed the jb/aic-2174/graph-tracking-refactor branch 2 times, most recently from 76b9580 to 7f0642e Compare May 4, 2026 14:13